We have systematically isolated and characterized DNA containing large CTG 
(n>7) repeats from a human cosmid genomic DNA library. Using a (CTG)10 
probe, more than 100 cosmid clones were identified, and 30 of these have been 
extensively characterized. The sequenced cosmids contain repeats that are 
between three and 19 perfect units (average 10 perfect repeats). The cosmids 
map to at least 12 different chromosomes. Sequence analysis of flanking 
regions suggests that more than one third of the repeats occur in exons, and 
many share strong sequence identity with databank sequences, including the 
gene involved in dentatorubral pallidoluysian atrophy (DRPLA). Genotyping of 
human DNA samples demonstrates that more than half of the repeats are 
polymorphic. This and similar collections of clones containing trinucleotide 
repeats should aid in the identification of genes that may contain expansions 
of trinucleotide repeats involved in human disease. 

Keywords: trinucleotide repeat; cosmid; fluorescent in situ hybridization 
Introduction 

Pathological expansion of trinucleotide repeats 1 is 
responsible for several human diseases, including 
Huntington's chorea, 2 spinocerebellar ataxias, 3 spinal 
bulbar muscular atrophy, 4 dentatorubral pallidoluysian 
atrophy (DRPLA) 5 and myotonic dystrophy. 6,7 In each 
of these disorders, normally polymorphic repetitive 
DNA regions of between 10 and 30 perfect CTG or 
CAG units expand to greater than 40 units, resulting in 
disruption of gene function. 

Previously, investigators have attempted to isolate 
and characterize segments of DNA containing large 
CTG repeats from cDNA rather than genomic 
libraries. 8-11 While all CTG repeats associated with disease 1 
should be represented, their isolation using cDNA 
libraries can be quite difficult for several reasons. First, 
low copy number, unstable, or tissue-specific RNAs 
may be under-represented or completely absent from 
certain cDNA libraries. 12 Second, this approach of 
screening cDNA will not identify 
trinucleotide-containing regions in introns or in regions flanking 
genes. Third, trinucleotide repeats may not subclone in 




Analysis of CTG-repeat Containing Cosmid Clones 

RA Philibert et al 



the vectors commonly used to generate cDNA 
libraries. 

In order to circumvent these difficulties and to obtain 
as complete a representation as possible, we isolated 
and characterized trinucleotide repeats from a genomic 
DNA library and directly sequenced the cosmid 
genomic DNA inserts. 13 In this report, we describe our 
findings from the study of 30 trinucleotide 
repeat-containing genomic clones. 



Materials and Methods 

The cosmid library was constructed from human DNA that was 
partially digested with Sau3A, ligated into the SuperCos I 
cosmid vector (Stratagene, La Jolla, CA), and packaged using 
Gigapak II (Stratagene, La Jolla, CA). Positive colonies 
were identified by hybridization using the oligonucleotide 
probe (CTG)10. 14 Briefly, three replicate cosmid colony lifts 
were prepared using ICN (Costa Mesa, CA) Biotrans 
membrane, prehybridized for one hour in buffer 
(5× SSPE, pH 7.0, 10× Denhardt's solution, 0.05% SDS, 
and 10 µg/ml sheared E. coli DNA), and finally hybridized 
overnight in buffer (5× SSPE, pH 7.0, 5× Denhardt's 
solution, 0.1% SDS, and 10 µg/ml sheared E. coli DNA) 
containing 5' 32P-labeled (CTG)10 probe. Individual sets of 
filters were then washed initially with two consecutive 5 min 
washes, followed by more stringent washes at either 
60°C, 70°C or 80°C for 15 min in 6× SSPE. Filters were then 
exposed to X-ray film (Kodak X-OMAT-AR) for 
approximately 16 h at -70°C. Cosmid DNA was prepared and 
sequenced using either manual radioactive or automated 
fluorescent methods as described previously. 13 

PCR Amplification 

PCR amplification of trinucleotide repeat-containing DNA 
was performed using standard PCR buffer (10 mM Tris-HCl 
(pH 8.3), 50 mM KCl, 0.001% gelatin, 2 mM MgCl2, 200 µM of 
each deoxynucleotide), 0.8 µM primers and 10% DMSO. Taq 
polymerase and genomic DNA concentrations were 
2.5 U/100 µl and 50 ng/100 µl, respectively. The thermal cycling 
parameters for amplification were: initial denaturation at 
95°C for 5 min, then 45 cycles of 95°C for 1 min, 65°C for 30 s, 
and 72°C for 2 min, followed by a final extension at 
72°C for 10 min. 
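
The amounts above are quoted per 100 µl, so setting up a smaller reaction means scaling them. The following is a minimal sketch of that arithmetic and of the total programmed hold time of the cycling profile. The function names and the 25 µl example volume are illustrative assumptions, not part of the published protocol, and ramp times between temperatures are ignored.

```python
# Concentrations and cycling steps are taken from the protocol text.
# The 25 µl volume in the example call is an illustrative assumption.
TAQ_UNITS_PER_100_UL = 2.5
DNA_NG_PER_100_UL = 50.0
DMSO_FRACTION = 0.10  # 10% (v/v)

# (temperature in °C, hold time in minutes) for one amplification cycle
CYCLE_STEPS = [(95, 1.0), (65, 0.5), (72, 2.0)]
N_CYCLES = 45
INITIAL_DENATURATION_MIN = 5.0
FINAL_EXTENSION_MIN = 10.0


def scale_per_reaction(volume_ul):
    """Scale the per-100 µl amounts to a reaction of the given volume."""
    factor = volume_ul / 100.0
    return {
        "taq_units": TAQ_UNITS_PER_100_UL * factor,
        "dna_ng": DNA_NG_PER_100_UL * factor,
        "dmso_ul": DMSO_FRACTION * volume_ul,
    }


def total_hold_minutes():
    """Sum the programmed hold times (ramp times are not included)."""
    per_cycle = sum(minutes for _, minutes in CYCLE_STEPS)
    return INITIAL_DENATURATION_MIN + N_CYCLES * per_cycle + FINAL_EXTENSION_MIN


if __name__ == "__main__":
    print(scale_per_reaction(25.0))
    print(total_hold_minutes())
```

With the stated profile, the programmed hold time is 5 + 45 × 3.5 + 10 = 172.5 min.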

Polymorphism Analysis 

Polymorphism analysis was conducted using DNAs from 
more than 30 unrelated individuals. The PCR products were 
separated by electrophoresis at 1700 volts for 2-3 h on a 6% 
denaturing polyacrylamide sequencing gel. The separated 
PCR products were then electroblotted onto a Hybond N+ 
membrane (Amersham UK), hybridized overnight at 42°C in 
buffer (0.25 M NaCl, 0.125 M NaPO4, 10% polyethylene 
glycol (MW 6000), and 6% SDS) to a 32P-labeled (CTG)10 
probe, then washed, first at room temperature and then at 
37°C, for 1 h with wash buffer (2× SSC/1% SDS). Filters 
were then exposed overnight to Kodak X-OMAT-AR film at 
-70°C. The size of PCR products was determined by 
comparison with DNA sequencing ladder fragments. 
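
Table 1 reports a heterozygosity value for each repeat, but the text does not state how it was calculated. As a hedged illustration only, the sketch below shows two common definitions applied to genotypes scored from allele sizes: the observed fraction of heterozygous individuals, and the expected heterozygosity 1 - Σ p_i² from allele frequencies. The function names and the example genotypes are hypothetical and are not data from this study.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    het = sum(1 for a, b in genotypes if a != b)
    return het / len(genotypes)


def expected_heterozygosity(genotypes):
    """1 - sum(p_i^2), where p_i is the frequency of allele i."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


if __name__ == "__main__":
    # Hypothetical genotypes: each tuple holds the two allele sizes
    # (in repeat units) scored for one individual.
    example = [(10, 10), (10, 12), (12, 12), (10, 12)]
    print(observed_heterozygosity(example))
    print(expected_heterozygosity(example))
```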



Sequence Analysis 

Sequence analysis was performed using the BLAST 15 and 
GRAIL (Oak Ridge National Laboratory) 16,17 programs. 
Database comparisons and analyses were conducted on the cosmid 
DNA sequences with and without the trinucleotide repeat 
regions (see Results). 
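
Preparing a sequence "without the trinucleotide repeat regions" can be sketched as a simple text operation. The example below is an assumption about how such a step might be scripted, not the procedure the authors used. It deletes perfect CTG or CAG tracts of at least `min_units` units, and the default of 3 is an illustrative choice.

```python
import re


def remove_repeat_tracts(seq, min_units=3):
    """Delete perfect CTG or CAG tracts of at least min_units units.

    Only tracts that are perfect runs of CTG or CAG are removed;
    interrupted repeats are left in place.
    """
    pattern = re.compile(
        r"(?:CTG){%d,}|(?:CAG){%d,}" % (min_units, min_units)
    )
    return pattern.sub("", seq.upper())


if __name__ == "__main__":
    # Hypothetical sequence: flanks of AAA and TTT around four CTG units.
    print(remove_repeat_tracts("AAACTGCTGCTGCTGTTT"))
```

Deleting the tract, rather than masking it with N characters, shortens the sequence and joins the two flanks, so coordinates downstream of the repeat shift accordingly.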

Chromosome Localization and Subchromosome 
Localization 

Chromosomal assignment of the trinucleotide repeats was 
performed by PCR of somatic cell hybrid DNA (MPD-5000) 
from Bios Laboratories (New Haven, Connecticut). 

Target Material for Fluorescence in situ Hybridization 
(FISH) 

Peripheral blood lymphocytes were cultured according to 
standard protocols, and cells were treated with 
5-bromodeoxyuridine (BrdU) at early replicating phase to induce a banding 
pattern. 18 Slides were stained with Hoechst 33258 (1 µg/ml) 
for 10 min and exposed to UV light (302 nm) for 30 min. 1 ' 
Before hybridization, metaphase slides were pretreated with 
RNAse (100 ng/ml) and pepsin (20 µg/ml). 

Probes for FISH 

CTG-containing cosmids were labeled with biotin 11-dUTP 
(Sigma Chemicals) by nick translation according to standard 
protocols (Nick Translation Kit, BRL). 

FISH 

The FISH procedure was carried out using 50% formamide, 
10% dextran sulfate in 2× SSC as described earlier. 19-22 
Repetitive sequences were suppressed with 10-30 fold excess 
of COT-1 DNA (BRL, Gaithersburg, MD). After overnight 
incubation, nonspecific hybridization signals were eliminated 
by washing the slides with 50% formamide/2× SSC, twice 
with 2× SSC, and once with 0.5× SSC at 45°C. Specific 
hybridization signals were visualized using FITC-conjugated 
avidin (Vector Laboratories) and slides were counterstained 
with DAPI (4',6-diamidino-2-phenylindole) (0.025 µg/ml). 
Only double spot signals were considered to be specific 
hybridizations. A multi-color image analysis system was used for 
acquisition, display and quantification of hybridization signals 
of metaphase chromosomes. The system consists of a 
Photometrics PXL camera (Photometrics Inc, Tucson, AZ) attached 
to a PowerMac 7100/AV workstation. IPLab software controls 
the camera operation, image acquisition and Ludl wheel. 23 



Results 

From 800 000 human genomic cosmid clones screened 
with a 32P-labeled (CTG)10 probe, 100 cosmids with 
positive hybridization signals were purified, and 30 
were sequenced using the degenerate primer method. 13 
Of these, 22 repeat sequences were unique whereas 
eight were represented twice. The chromosomal 
localization, the length of the trinucleotide repeat, the 
heterozygosity, and the PCR primer sequences used to 
amplify the repeat region are shown in Table 1. 
Although the repeats average almost 10 perfect repeat 
units, some have additional short repeats adjacent to 
the CTG repeats. For example, CTG-1 and CTG-15 
have large CAA or GAA repeats adjacent to the CTG 
repeat. The probability of each trinucleotide repeat 
being located within an exon was determined by the 
Gene Recognition and Analysis Interlink (GRAIL) 
program 16 (Table 1). In order to avoid difficulties 
inherent in the analysis of repetitive DNA regions, the 
trinucleotide repeat was deleted before the GRAIL 
analyses. Despite the short length of many of the 
sequences submitted for GRAIL analysis, 
approximately one third of the sequences had a good or 
excellent probability of occurring in exons. 
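
The count of "perfect repeat units" can be made concrete as the longest uninterrupted run of the repeat motif. The following is a minimal sketch under that assumption. The function names are hypothetical, and the reverse complement is also checked because a CTG repeat reads as CAG on the opposite strand.

```python
import re


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_perfect_run(seq, unit="CTG"):
    """Number of units in the longest uninterrupted run of `unit`
    or of its reverse complement (CAG for CTG)."""
    seq = seq.upper()
    best = 0
    for motif in (unit, revcomp(unit)):
        for match in re.finditer(r"(?:%s)+" % motif, seq):
            best = max(best, len(match.group(0)) // len(motif))
    return best


if __name__ == "__main__":
    # Hypothetical sequence: three CTG units, one CAG interruption,
    # then two more CTG units. The longest perfect run is three units.
    print(longest_perfect_run("GGCTGCTGCTGCAGCTGCTGAA"))
```

A single interrupting triplet splits the tract into two shorter perfect runs, which is why the perfect-unit count can be smaller than the total length of the repeat region.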

These trinucleotide-depleted sequences were also 
submitted to the Basic Local Alignment Search Tool 
(BLAST) 15 for comparison to the GenBank and Swiss-Prot 
data banks. DNA in nine clones showed at least a mild 
degree (p < 10^-15) of sequence homology to database 
entries. Regions of cosmids CTG-37 and CTG-23 show 
almost complete sequence identity to a mouse open 
reading frame (ORF) encoding a central nervous 
system protein, while CTG-22 shows strong sequence 
identity with a region of beta-luteinizing hormone. 
Cosmid CTG-56 shows considerable sequence identity 
with wglA (EMB(X76569)), a previously isolated 
trinucleotide repeat. 24 Cosmid CTG-18 contains the 
genomic clone of the DRPLA cDNA clone. 3 CTG-86 is 
similar to CTG-B10, a trinucleotide-containing clone 
previously isolated from a human brain cDNA library. 9 
For the other 21 CTG repeat-containing clones, 
including five with a good or excellent probability of 
occurring in exons, no sequence homology to database 
entries was identified. 



Discussion 

Our findings suggest that the direct sequencing of 
genomic trinucleotide repeat-containing clones is 
useful for studying the involvement of these repetitive 
regions in human disease. With a few exceptions, 24-27 
previous attempts to characterize large CTG repeats 
have utilized cDNA libraries, 8,9 resulting in a bias 
toward over-represented, more clonable, and/or more 
abundant transcripts. This makes the isolation of 
interesting, rare or less stable cDNAs difficult, in 
contrast to procedures using genomic libraries, which 
tend to have a less biased representation of the total 
candidate gene pool. 

The direct sequencing of cosmid clones 13 has several 
advantages. First, the large trinucleotide repeats, which 
tend to be eliminated in smaller plasmids, are more 
stable in cosmids. Second, analysis of the genomic DNA 
sequence surrounding the repeat allows us to 
determine whether the repeat could be located within an 
exon. Third, the additional sequence available in a 
cosmid can be used to generate FISH probes, allowing 
subchromosomal localization of the clone. The isolation of 
genomic trinucleotide repeats by subcloning filter 
hybridization-enriched, PCR-amplified, MboI-digested 
genomic fragments can be an alternative to generation 
of a primary library, 24 but these repetitive regions are 
often difficult to amplify, 28 resulting in the isolation of 
smaller, less GC-rich repeats that provide much less 
sequence information. 

Using an approach in which the repetitive CTG 
sequence is removed, GRAIL analyses indicated that at 
least one third of these sequences have a good or excellent 
probability of being found in a coding exon. This may 
underestimate the frequency of ORFs, since at least one 
sequence, CTG-18, which represents part of the 
DRPLA locus, was not detected by this GRAIL 
analysis. This omission may have occurred because 
GRAIL sometimes fails to recognize coding exons less 
than 100 bp in length. In an analysis of genomic CTG 
repeat sequences obtained from GenBank, Stallings 29 
concluded that one third of CTG repeats and almost all 
CAG repeats were located in exons. Our results are in 
good agreement with these previous findings. 

Comparison of the repeat sequences in our study 
with those in GenBank demonstrates that several 
have significant sequence identity with previously 
described DNA sequences. The finding that CTG-18 is 
a partial genomic clone for the DRPLA cDNA 
illustrates the usefulness of this approach to search for 
trinucleotide repeats that may be involved in human 
disease. Both CTG-23 and CTG-37 have considerable 
sequence identity with different parts of murine ORF 
(D29801). Interestingly, GRAIL predicts that, like the 
CAG repeats from the mouse ORF (D29801), the 
repeats from CTG-23 and CTG-37 are exonic in 
humans. However, the murine repeats are much 
smaller, being only 2 or 3 CAG units in length. This 
suggests that the trinucleotide repeats on chromosome 
17 represented by CTG-23 and CTG-37 expanded after 
the divergence of human and mouse genomes. 

With two exceptions, CTG-11 and CTG-17, the FISH 
data confirm the somatic cell PCR localization results. 
Two of the repeat-containing cosmids, CTG-74 and 
CTG-15, map by FISH to two distinct loci. This 
observation may result from the presence of multiple 
copies of these trinucleotide repeats or suggest the 
presence of a gene family of related sequences. This is 
not surprising since at least one repeat, CTG-47, gives 
four allele fragments on PCR amplification of human 
genomic DNA. However, unlike the case for CTG-74 and CTG-15, 
chromosome localization performed using somatic cell 
hybrids suggests that all the loci encoding the CTG-47 
repeat sequence are on chromosome 7. 

In summary, we demonstrate that direct sequencing 
of cosmid clones from a genomic library is a useful 
approach to isolating and characterizing DNA 
sequences containing trinucleotide repeats that could 
be involved in human disease. The chromosomal and 
sub-chromosomal localization data presented here 
provide sequences that may help to identify candidate 
genes for diseases mapping nearby or for syndromes 
that have yet to be localized. 